Unconstrained Tight Structure Extraction Using Voronoi Tesselation on Document Images
Authors
Abstract
Document structure is the intermediate result of page segmentation and is used in the analysis of a document image. The structure captures the shape of the document from the paragraph level to the character level, following a hierarchical, exploratory methodology for understanding the layout of the document image. The extracted layout is a dominant feature that plays a vital role in the categorization, equivalence, ranking and retrieval of documents without reading them. The main theme of this paper is to obtain the structure of a document whose entities have an unconstrained layout because of their differing contents. A Voronoi tessellation of the document image, with every high pixel in the image taken as a point of interest, helps in recognizing the structure of the entities in the document image. A new technique, known as Spring Force, is implemented to capture the points of interest that lie in the unbounded region and the exterior points that frame the outer boundary of the entities, so that a tight structure is obtained. The Spring Force technique yields not only a tight structure for the entities but also the structure of the blank space in the document image. This method works successfully on all types of document images with Non-Manhattan layouts with
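The idea of points of interest with unbounded Voronoi regions can be illustrated with a small sketch. It is not the Spring Force technique from the abstract, whose details are not given here. It only relies on a standard geometric fact: a site's Voronoi cell is unbounded exactly when the site lies on the boundary of the convex hull of all sites. The function names `foreground_points` and `unbounded_sites` and the toy image are hypothetical, chosen for illustration.

```python
from typing import Iterable, List, Tuple

Point = Tuple[int, int]


def foreground_points(image: List[List[int]]) -> List[Point]:
    """Return (x, y) for every high (non-zero) pixel of a binary image."""
    return [(x, y) for y, row in enumerate(image) for x, v in enumerate(row) if v]


def cross(o: Point, a: Point, b: Point) -> int:
    """Z-component of (a - o) x (b - o); negative means a right turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def unbounded_sites(points: Iterable[Point]) -> List[Point]:
    """Sites whose Voronoi cell is unbounded, i.e. sites on the hull boundary.

    Assumption: this uses Andrew's monotone chain and keeps collinear
    boundary points, because they also have unbounded cells.
    """
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def half(seq: Iterable[Point]) -> List[Point]:
        chain: List[Point] = []
        for p in seq:
            # Pop only on a strict right turn so collinear points survive.
            while len(chain) >= 2 and cross(chain[-2], chain[-1], p) < 0:
                chain.pop()
            chain.append(p)
        return chain

    lower = half(pts)
    upper = half(reversed(pts))
    return sorted(set(lower + upper))


# Toy binary image: a 3x3 block of high pixels surrounded by blank space.
image = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
sites = unbounded_sites(foreground_points(image))
```

For the toy image, `sites` holds the eight pixels on the rim of the block and leaves out the centre pixel (2, 2), whose Voronoi cell is enclosed by its neighbours.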
Similar resources
Constrained Learning Vector Quantization or Relaxed k-Separability
Neural networks and other sophisticated machine learning algorithms frequently miss simple solutions that can be discovered by more constrained learning methods. The transition from a single neuron solving linearly separable problems, to a multithreshold neuron solving k-separable problems, to neurons implementing prototypes that solve q-separable problems, is investigated. Using Learning Vector Quant...
Automatic Road Detection and Extraction From MultiSpectral Images Using a New Hierarchical Object-based Method
Road detection and extraction is one of the most important issues in photogrammetry, remote sensing and machine vision. A great deal of research based on multispectral images has been done in this area, mostly with relatively good results. In this paper, a novel automated and hierarchical object-based method for detecting and extracting roads is proposed. This research is based on the M...
Document Analysis And Classification Based On Passing Window
In this paper we present a document analysis and classification system to segment and classify the contents of Arabic document images. This system includes preprocessing, document segmentation, feature extraction and document classification. In preprocessing, a document image is enhanced by removing noise, binarizing, and detecting and correcting image skew. In document segmentation, an algorith...
Word Extraction and Character Segmentation from Text Lines of Unconstrained Handwritten Bangla Document Images
In this paper, a novel approach for word extraction and character segmentation from handwritten Bangla document images is reported. First, a modified Run Length Smoothing Algorithm (RLSA), called the Spiral Run Length Smearing Algorithm (SRLSA), is applied to extract words from the text lines of unconstrained handwritten Bangla document images. This technique has helped to overcom...